
ggml-virtgpu: make the code thread safe#19204

Merged
taronaeo merged 12 commits into ggml-org:master from kpouget:leaks
Feb 4, 2026

Conversation

@kpouget (Contributor) commented Jan 30, 2026

This PR makes the ggml-virtgpu backend thread safe by using a mutex to guard access to the host<>guest shared memory buffers, and by pre-caching, during initialization, the constant values queried from the backend.

The unused buffer_type_is_host method is also deprecated.
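The two patterns described above can be sketched roughly as follows. This is a minimal illustration with invented names (`virtgpu_shared_buffer`, `virtgpu_context`, `query_device_props` are all hypothetical), not the actual backend code: a mutex serializes transfers through the shared region, and constant device properties are queried once at initialization instead of on every call.

```cpp
#include <cassert>
#include <cstring>
#include <mutex>
#include <string>

// Hypothetical stand-ins for the backend's types.
struct virtgpu_shared_buffer {
    std::mutex mtx;      // serializes host<>guest transfers
    char data[4096] {};  // stand-in for the shared memory region
};

struct device_props {
    std::string name;    // constant after init
    size_t max_alloc;
};

// Pretend remote query; normally a guest->host round trip.
static device_props query_device_props() {
    return { "virtgpu-device", 1u << 20 };
}

struct virtgpu_context {
    virtgpu_shared_buffer shmem;
    device_props props;  // queried once at init, read-only afterwards

    virtgpu_context() : props(query_device_props()) {}

    // Every transfer takes the lock, so concurrent calls cannot
    // interleave their writes into the shared region.
    void write_shared(const void * src, size_t size) {
        std::lock_guard<std::mutex> lock(shmem.mtx);
        std::memcpy(shmem.data, src, size);
    }
};
```

Because `props` is filled in before any worker thread exists, later reads need no locking; only the mutable shared buffer needs the mutex.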

@github-actions github-actions bot added python python script changes ggml changes relating to the ggml tensor library for machine learning labels Jan 30, 2026
@taronaeo (Collaborator) left a comment


I see that some of the GGML_ABORT statements do not include __func__. Would it be better to include them for debugging and error tracing in the future? :)

@kpouget (Contributor, Author) commented Jan 30, 2026

Thanks for the feedback, I've applied the suggestions.

I see that some of the GGML_ABORT statements do not include __func__. Would it be better to include them for debugging and error tracing in the future? :)

I'm not sure about this one, that's not a common pattern in the code base,

$ gg GGML_ABORT | grep __func__ | wc -l
37 # including it
$ gg GGML_ABORT | grep -v __func__ | wc -l
446 # not including it

and when I hit an abort, I automatically get a stack trace:

ggml/src/ggml-virtgpu/virtgpu-common.cpp:19: calling abort
bin/libggml-base.so.0(+0x75ce) [0x720ce69285ce]
bin/libggml-base.so.0(ggml_print_backtrace+0x25f) [0x720ce692884c]
bin/libggml-base.so.0(ggml_abort+0x160) [0x720ce6928a1b]
libggml-virtgpu.so.0(+0x8be5) [0x720ce6a14be5]
libggml-virtgpu.so.0(+0x8c08) [0x720ce6a14c08]

so I don't think it's necessary, do you?
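(For reference, the pattern under discussion looks roughly like this. A minimal sketch with invented names, `format_error` and `failing_op` are not the actual ggml API: prefixing the message with `__func__` bakes the failing function's name into the log line, independent of backtrace support.)

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build an error line with a backend prefix and the caller's name;
// "ggml-virtgpu:" and format_error are illustrative, not the real API.
static std::string format_error(const char * func, const char * msg) {
    char buf[256];
    std::snprintf(buf, sizeof(buf), "ggml-virtgpu: %s: %s", func, msg);
    return buf;
}

static std::string failing_op() {
    // The real code would call GGML_ABORT("%s: ...", __func__) instead
    // of returning; we return the string here so it can be inspected.
    return format_error(__func__, "unsupported operation");
}
```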

@taronaeo (Collaborator) commented Jan 30, 2026

so I don't think it's necessary, do you?

I usually do it for bug reporting purposes, so it's easier to identify which line of code is failing and the order in which things happened. But it's just a suggestion, feel free to ignore it if we aren't expecting specific GGML_ABORT lines to hit/fail often haha

Edit: Also another thought. Most users are end-consumers who do not compile from source and instead use a pre-built release binary. IIRC (I may be wrong), release builds do not show full backtrace information for the failing lines of code leading to the abort, or the information may have been optimized out; that makes it much harder to debug.

@kpouget (Contributor, Author) commented Jan 30, 2026

Edit: Also another thought. Most users are end-consumers who do not compile from source and rather, use a pre-built release binary. IIRC, I may be wrong, release builds do not show full backtrace information on the failing lines of code leading to the abort, or may have just been optimized out; much harder to debug.

Good point, I missed that. But I was thinking about it as well, and I think what matters is that the message is unique, so that you can locate it easily when you get a user's log trace.

Anyway, I'll think about it; I want to improve the error message when running in an unsupported environment (no virtgpu should be good already, but unpatched virglrenderer can be improved, I guess).

@kpouget (Contributor, Author) commented Feb 3, 2026

I've updated the code with cleaner logging, and I finally followed your suggestion with __func__. ABORT/ERROR/WARNING now include it (not INFO, on purpose), and they all include the ggml-virtgpu: prefix for clarity. Makes the logging more readable.

I also reworked the abort on init, so that it doesn't abort if the virtgpu isn't detected.

@taronaeo (Collaborator) left a comment

Minor formatting changes :)

@kpouget (Contributor, Author) commented Feb 3, 2026

good catches, thanks, fixed as suggested :)

@taronaeo (Collaborator) commented Feb 3, 2026

Great! Wait for CI to go green and we'll merge.

@taronaeo (Collaborator) commented Feb 4, 2026

CI / ggml-ci-x64-nvidia-vulkan-cm (pull_request) failure is unrelated to this PR. Merging.

@taronaeo merged commit 015deb9 into ggml-org:master on Feb 4, 2026
79 of 80 checks passed
agent-enemy-2 pushed a commit to agent-enemy-2/llama.cpp that referenced this pull request Feb 4, 2026
* ggml-virtgpu: regenerate_remoting.py: add the ability to deprecate a function

* ggml-virtgpu: deprecate buffer_type is_host remoting

not necessary

* ggml-virtgpu: stop using static vars as cache

The static init isn't thread safe.

* ggml-virtgpu: protect the use of the shared memory to transfer data

* ggml-virtgpu: make the remote calls thread-safe

* ggml-virtgpu: backend: don't continue if couldn't allocate the tensor memory

* ggml-virtgpu: add a cleanup function for consistency

* ggml-virtgpu: backend: don't crash if buft->iface.get_max_size is missing

* fix style and ordering

* Remove the static variable in apir_device_get_count

* ggml-virtgpu: improve the logging

* fix review minor formatting changes
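The "stop using static vars as cache" commit above can be sketched like this (hypothetical names; `remote_get_device_count`, `get_count_racy`, and `backend_state` are invented for illustration): a lazily assigned function-local static races when two threads call it at once, while a value fetched once during single-threaded initialization is safe to read afterwards.

```cpp
#include <cassert>

// Pretend guest->host query; the real call crosses the virtgpu boundary.
static int remote_get_device_count() { return 1; }

// Racy pattern being removed: two threads can both observe count == -1
// and issue the remote call, and the plain int write is not atomic.
int get_count_racy() {
    static int count = -1;
    if (count == -1) {
        count = remote_get_device_count();
    }
    return count;
}

// Replacement: fetch the value once while the backend initializes
// (single-threaded), then treat it as read-only.
struct backend_state {
    int device_count;
    backend_state() : device_count(remote_get_device_count()) {}
};
```

Note that C++11 makes the *initialization* of a local static thread safe, but the later `count = ...` assignment is an ordinary data race, which is why moving the query to init time is the cleaner fix.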

Labels

ggml changes relating to the ggml tensor library for machine learning python python script changes


2 participants